Learning Algorithms for Markov Decision Processes with Average Cost

نویسندگان

  • Jinane Abounadi
  • Dimitri P. Bertsekas
  • Vivek S. Borkar
چکیده

This paper gives the first rigorous convergence analysis of analogs of Watkins’ Q-learning algorithm, applied to average cost control of finite-state Markov chains. We discuss two algorithms which may be viewed as stochastic approximation counterparts of two existing algorithms for recursively computing the value function of average cost problem the traditional relative value iteration algorithm and a recent algorithm of Bertsekas based on the stochastic shortest path (SSP) formulation of the problem. Both synchronous and asynchronous implementations are considered and are analysed using the “ODE” method. This involves establishing asymptotic stability of associated ODE limits. The SSP algorithm also uses ideas from two time scale stochastic approximation.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Utilizing Generalized Learning Automata for Finding Optimal Policies in MMDPs

Multi agent Markov decision processes (MMDPs), as the generalization of Markov decision processes to the multi agent case, have long been used for modeling multi agent system and are used as a suitable framework for Multi agent Reinforcement Learning. In this paper, a generalized learning automata based algorithm for finding optimal policies in MMDP is proposed. In the proposed algorithm, MMDP ...

متن کامل

Title of dissertation : LEARNING ALGORITHMS FOR MARKOV DECISION PROCESSES

Title of dissertation: LEARNING ALGORITHMS FOR MARKOV DECISION PROCESSES Abraham Thomas, Doctor of Philosophy, 2009 Dissertation directed by: Professor Steven Marcus Department of Electrical and Computer Engineering We propose various computational schemes for solving Partially Observable Markov Decision Processes with the finite stage additive cost and infinite horizon discounted cost criterio...

متن کامل

Reinforcement Learning Based Algorithms for Average Cost Markov Decision Processes

This article proposes several two-timescale simulation-based actor-critic algorithms for solution of infinite horizon Markov Decision Processes with finite state-space under the average cost criterion. Two of the algorithms are for the compact (non-discrete) action setting while the rest are for finite-action spaces. On the slower timescale, all the algorithms perform a gradient search over cor...

متن کامل

Semi-Markov decision problems and performance sensitivity analysis

Recent research indicates that Markov decision processes (MDPs) can be viewed from a sensitivity point of view; and perturbation analysis (PA), MDPs, and reinforcement learning (RL) are three closely related areas in optimization of discrete-event dynamic systems that can be modeled as Markov processes. The goal of this paper is two-fold. First, we develop PA theory for semi-Markov processes (S...

متن کامل

Simulation-Based Algorithms for Average Cost Markov Decision Processes

In this paper, we give a summary of recent development of simulation-based algorithms for average cost MDP problems, which are different from those for discounted cost problems or shortest path problems. We introduce both simulation-based policy iteration algorithms and simulation-based value iteration algorithms for average cost problem, and give the pros and cons of each algorithm.

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • SIAM J. Control and Optimization

دوره 40  شماره 

صفحات  -

تاریخ انتشار 2001